Add .tar.gz / .tgz archive support for extension, preset, and workflow installation (#2395)

Copilot wants to merge 20 commits
Conversation
…installation Agent-Logs-Url: https://github.com/github/spec-kit/sessions/9fb9a8ea-0967-4baf-b95c-7101e423ff58 Co-authored-by: mnriem <15701806+mnriem@users.noreply.github.com>
Co-authored-by: Copilot Autofix powered by AI <223894421+github-code-quality[bot]@users.noreply.github.com>
```python
# Only allow regular files and directories.
if not (member.isreg() or member.isdir()):
    raise error_class(
        f"Non-regular file in archive: {member.name}"
    )
```
```python
with __import__("unittest.mock", fromlist=["patch"]).patch.object(
    __import__("pathlib", fromlist=["Path"]).Path, "cwd", return_value=project_dir
):
```
@copilot apply changes based on the comments in this thread
…ock imports in workflow tests Agent-Logs-Url: https://github.com/github/spec-kit/sessions/c1fcc1ff-8766-4d97-90a5-368447980acf Co-authored-by: mnriem <15701806+mnriem@users.noreply.github.com>
Applied both changes in commit 05798a9:
Copilot's findings
Comments suppressed due to low confidence (2)
src/specify_cli/extensions.py:231
- In the Python 3.12+ branch, `safe_extract_tarball()` validates members into `safe_members` (and intentionally skips PAX/GNU metadata entries), but then calls `tf.extractall(dest_dir, filter="data")` without restricting extraction to `safe_members`. This makes extraction behavior inconsistent across Python versions and can reintroduce metadata entries (or other unexpected members) into the extraction step. Pass `members=safe_members` in the 3.12+ branch as well (and keep the `filter="data"` hardening), or extract only the validated members explicitly.
```python
# Extract — use the "data" filter on Python 3.12+ for extra hardening.
# On older versions pass only the pre-validated members so that no
# unvetted entry (added concurrently or via a race) slips through.
if sys.version_info >= (3, 12):
    tf.extractall(dest_dir, filter="data")  # type: ignore[call-arg]
else:
    tf.extractall(dest_dir, members=safe_members)  # noqa: S202 — validated above
```
src/specify_cli/init.py:3663
`extension add --from` detects archive format from the original URL and optional `Content-Type`, but does not validate the final URL after redirects. An HTTPS URL can redirect to non-HTTPS, and format detection can be wrong if the redirect changes the effective filename/extension. Consider validating `response.geturl()` against the HTTPS/localhost policy and running `detect_archive_format()` against that final URL.
```python
with urllib.request.urlopen(from_url, timeout=60) as response:
    if not archive_fmt:
        content_type = response.headers.get("Content-Type", "")
        archive_fmt = detect_archive_format(from_url, content_type)
    archive_data = response.read()
```
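A minimal sketch of the suggested hardening, using only the stdlib. Here `is_url_allowed` is a hypothetical stand-in for the PR's HTTPS/localhost policy (not its actual code), and the caller is expected to run format detection on the returned post-redirect URL:

```python
import urllib.parse
import urllib.request

# Hosts where plain HTTP is tolerated (illustrative policy, not the PR's).
ALLOWED_PLAIN_HTTP_HOSTS = {"localhost", "127.0.0.1", "::1"}

def is_url_allowed(url: str) -> bool:
    """HTTPS anywhere; plain HTTP only for localhost-style hosts."""
    parts = urllib.parse.urlparse(url)
    if parts.scheme == "https":
        return True
    return parts.scheme == "http" and parts.hostname in ALLOWED_PLAIN_HTTP_HOSTS

def fetch_archive(from_url: str) -> tuple[bytes, str]:
    """Download an archive, re-validating the URL after redirects."""
    if not is_url_allowed(from_url):
        raise ValueError(f"URL not allowed: {from_url}")
    with urllib.request.urlopen(from_url, timeout=60) as response:
        # Redirects can change both the scheme and the effective filename,
        # so the post-redirect URL is what policy and format detection
        # should see, not the URL the user originally supplied.
        final_url = response.geturl()
        if not is_url_allowed(final_url):
            raise ValueError(f"Redirected to disallowed URL: {final_url}")
        return response.read(), final_url
```

The caller then runs `detect_archive_format()` on the returned final URL rather than on `from_url`, closing the filename-changed-by-redirect gap the comment describes.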
- Files reviewed: 6/6 changed files
- Comments generated: 4
```python
_TAR_METADATA_TYPES = (
    tarfile.XHDTYPE,          # PAX extended header
    tarfile.XGLTYPE,          # PAX global extended header
    tarfile.SOLARIS_XHDTYPE,  # Solaris PAX extended header
    *tarfile.GNU_TYPES,       # GNU longname / longlink / sparse
)
```
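A vetting loop over such a tuple might look like the sketch below. The metadata tuple mirrors the snippet above, but spelled out without `GNUTYPE_SPARSE` (which, per the PR description, is deliberately extracted because sparse entries carry a real payload); the function name and error handling are illustrative, not the PR's code:

```python
import tarfile
from pathlib import Path

# Metadata-only entry types: skipped rather than extracted. GNUTYPE_SPARSE
# is deliberately absent, since sparse entries carry a real file payload.
_TAR_METADATA_TYPES = (
    tarfile.XHDTYPE,           # PAX extended header
    tarfile.XGLTYPE,           # PAX global extended header
    tarfile.SOLARIS_XHDTYPE,   # Solaris PAX extended header
    tarfile.GNUTYPE_LONGNAME,  # GNU long-name record
    tarfile.GNUTYPE_LONGLINK,  # GNU long-link record
)

def vet_members(tf: tarfile.TarFile, dest_dir, error_class=ValueError):
    """Return only the members that are safe to extract under dest_dir."""
    dest = Path(dest_dir).resolve()
    safe_members = []
    for member in tf.getmembers():
        if member.type in _TAR_METADATA_TYPES:
            continue  # no extractable payload; emitted by common tools
        # Reject any member whose resolved target escapes dest_dir.
        target = (dest / member.name).resolve()
        if not target.is_relative_to(dest):
            raise error_class(f"Path traversal in archive: {member.name}")
        # Reject symlinks, hard links, devices, FIFOs, etc.
        if not (member.isreg() or member.isdir()):
            raise error_class(f"Non-regular file in archive: {member.name}")
        safe_members.append(member)
    return safe_members
```

The returned list is what would then be passed as `members=safe_members` to `extractall()`, so extraction never touches an unvetted entry.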
```diff
 archive_fmt = _det_fmt(from_url)
 try:
     with urllib.request.urlopen(from_url, timeout=60) as response:
-        zip_path.write_bytes(response.read())
+        if not archive_fmt:
+            content_type = response.headers.get("Content-Type", "")
+            archive_fmt = _det_fmt(from_url, content_type)
```
```diff
+# Detect archive format from URL; resolve via Content-Type when needed.
+archive_fmt = detect_archive_format(download_url)
+
 # Download the archive
 try:
     with self._open_url(download_url, timeout=60) as response:
-        zip_data = response.read()
-        zip_path.write_bytes(zip_data)
-        return zip_path
+        if not archive_fmt:
+            content_type = response.headers.get("Content-Type", "")
+            archive_fmt = detect_archive_format(download_url, content_type)
+        archive_data = response.read()
```
@copilot apply changes based on the comments in this thread
…r redirect-safe format detection and HTTPS re-check Agent-Logs-Url: https://github.com/github/spec-kit/sessions/739d3f73-200b-417a-8a86-134329200560 Co-authored-by: mnriem <15701806+mnriem@users.noreply.github.com>
Applied all four changes in commit 1015ff2:
```python
# On older versions pass only the pre-validated members so that no
# unvetted entry (added concurrently or via a race) slips through.
if sys.version_info >= (3, 12):
    tf.extractall(dest_dir, filter="data")  # type: ignore[call-arg]
```
@copilot apply changes based on the comments in this thread

Fixed in commit — the Python 3.12+ branch now calls …
The extension, preset, and workflow download pipelines only accepted ZIP archives, blocking use of npm registries and CI artifact stores that serve tarballs natively.
Core utilities (`extensions.py`)

- `detect_archive_format(url, content_type="")` — infers format from the URL path extension (`.zip`, `.tar.gz`, `.tgz`) with `Content-Type` header fallback (`application/gzip`, `application/x-gzip`, `application/x-tar+gzip`)
- `safe_extract_tarball(archive_path, dest_dir, error_class)` — safe extraction with:
  - rejection of `..` traversal, symlinks, hard links, devices, and FIFOs
  - PAX metadata entries (`XHDTYPE`, `XGLTYPE`, `SOLARIS_XHDTYPE`) and GNU metadata-only entries (`GNUTYPE_LONGNAME`, `GNUTYPE_LONGLINK`) silently skipped — they carry no extractable payload and are emitted by many common archiving tools; `GNUTYPE_SPARSE` is intentionally not skipped because sparse entries carry a real file payload and `isreg()` returns `True` for them
  - the vetted `safe_members` list passed to `extractall()` to ensure only vetted entries are extracted
  - `tarfile.data_filter` for extra OS-level protection, combined with `members=safe_members`
  - `tarfile.TarError` / `OSError` caught and re-raised as the caller-supplied `error_class` for consistent error handling

Both helpers are public (no underscore prefix) and imported directly by
`presets.py` and `__init__.py`.

Extensions & presets
- `install_from_zip()` on both managers now detects archive format from the file extension and dispatches to ZIP or tarball extraction accordingly — existing callers are unaffected
- `download_extension()` / `download_pack()` detect format from the download URL (or `Content-Type` fallback) and persist the archive with the correct extension (`.zip` or `.tar.gz`); unknown formats are rejected with a clear error rather than silently defaulting to ZIP
- both capture `response.geturl()` as the canonical post-redirect URL, use it for `Content-Type`-fallback format detection, and re-validate the final URL's scheme to guard against scheme-downgrade via redirects

`__init__.py` call sites

- `extension add --from`
- `preset add --from`
- `extension update` — uses `Path(extension).name` to prevent path traversal
- `workflow add` (URL) — extracts `workflow.yml` from the archive when the URL points to one; temp-file paths initialized before the write to avoid `UnboundLocalError` on disk-full
- `workflow add` (local) — accepts `.tar.gz` / `.tgz` / `.zip` archive files (case-insensitive detection)
- `workflow add` (catalog)

A shared
`_extract_workflow_yml(archive_path, fmt)` helper handles root-level and single-nested-directory layouts for both formats, with `tarfile.extractfile()` handles properly closed via context managers.

Tests

30 new tests across `test_extensions.py`, `test_presets.py`, and `test_workflows.py`, covering:

- `test_extensions.py`: format detection (URL + `Content-Type`), flat and nested tarball install, missing manifest errors, path traversal rejection, symlink rejection
- `test_presets.py`: flat and nested tarball install, missing manifest errors, path traversal rejection, symlink rejection
- `test_workflows.py` (`TestWorkflowAddArchive`, 9 CLI-level tests): local ZIP (flat/nested), local `.tar.gz` (flat/nested), `.tgz` alias, missing `workflow.yml` error cases, URL-based archive download for both ZIP and tar.gz formats
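For context on the flat-vs-nested layouts those workflow tests exercise, a lookup like the PR's shared extraction helper could be sketched as follows. The function name (spelled here without the underscore prefix), signature, and the "at most one directory deep" rule are illustrative assumptions, not the PR's code:

```python
import tarfile
import zipfile

def extract_workflow_yml(archive_path: str, fmt: str) -> bytes:
    """Return workflow.yml bytes from the archive root or from a single
    top-level directory (e.g. a GitHub-style repo-main/workflow.yml)."""
    if fmt == "zip":
        with zipfile.ZipFile(archive_path) as zf:
            for name in zf.namelist():
                parts = name.split("/")
                if parts[-1] == "workflow.yml" and len(parts) <= 2:
                    return zf.read(name)
    elif fmt == "tar.gz":
        with tarfile.open(archive_path, "r:gz") as tf:
            for member in tf.getmembers():
                parts = member.name.split("/")
                if member.isreg() and parts[-1] == "workflow.yml" and len(parts) <= 2:
                    extracted = tf.extractfile(member)
                    if extracted is not None:
                        # Context-manage the handle so it closes promptly.
                        with extracted:
                            return extracted.read()
    raise FileNotFoundError(f"workflow.yml not found in {archive_path}")
```

The `len(parts) <= 2` check is what accepts both the flat layout (`workflow.yml`) and the single-nested one (`repo-main/workflow.yml`) while ignoring anything buried deeper.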